302 PART 6 Analyzing Survival Data
In this chapter, we explain how survival data aren’t like ordinary numerical data
and why you need to use specific techniques to analyze them properly. We describe
two ways to construct survival curves: the life-table and the Kaplan-Meier
methods. We guide you in preparing and interpreting survival curves and show you how
to glean useful information from these curves, such as median survival time and
five-year survival rates.
Understanding the Basics of Survival Data
To understand survival analysis, you first have to understand survival data.
Survival times are intervals between a designated starting time point and the time
point at which an event occurs. These intervals can have a specific type of missing
data due to a phenomenon called censoring. Because survival data usually include
censored data, they must be analyzed in a very specific way to avoid generating
biased estimates that lead to incorrect conclusions.
Examining how survival times are intervals
The techniques described in this chapter for summarizing, graphing, and
comparing survival data deal with the time interval from a defined starting point to
the first occurrence of an endpoint event. The event can be designated as death or
a relapse of a particular condition, such as a recurrence of cancer. Or you could
designate the event to be surgical removal (called an explant) of a failed
mechanical component, such as an artificial heart valve. If a patient’s heart valve was
implanted on January 10 (beginning of time interval), but their body rejected it
and the explant took place on January 30 (time of event), then the time interval
from implant to explant is 30 – 10, or 20 days.
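The same arithmetic can be sketched in Python by subtracting one date from another. This is only an illustration: the year (2023) is an assumption, because the example gives just the month and day, and the variable names are hypothetical.

```python
from datetime import date

# Assumed year for illustration; the example states only month and day.
implant_date = date(2023, 1, 10)   # beginning of the time interval
explant_date = date(2023, 1, 30)   # time of the event

# Subtracting two dates gives a timedelta; .days is the interval length.
interval_days = (explant_date - implant_date).days
print(interval_days)  # 20
```

Working with real dates rather than day-of-month numbers keeps the subtraction correct when the interval crosses a month or year boundary.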
A person can die only once, so survival analysis can obviously be used for
one-time events. But other endpoints can occur multiple times, such as having a stroke
or having cancer go into remission. The techniques we describe in this chapter
analyze only the time to the first occurrence of the event. Handling multiple
occurrences of an event requires more advanced survival analysis methods, and
these are beyond the scope of this book.
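When an endpoint can recur, one way to prepare the data is to keep only the earliest event for each patient. The following is a minimal sketch under assumed conditions: the patient IDs, dates, and the function name time_to_first_event are all hypothetical and made up for illustration.

```python
from datetime import date

# Hypothetical records: (patient_id, start_date, event_date).
# Patient "A" has two recorded events; only the earliest should count.
events = [
    ("A", date(2023, 1, 1), date(2023, 6, 15)),
    ("A", date(2023, 1, 1), date(2023, 3, 2)),
    ("B", date(2023, 2, 1), date(2023, 2, 21)),
]

def time_to_first_event(records):
    """Return {patient_id: days from start to that patient's earliest event}."""
    first = {}
    for patient_id, start, event in records:
        days = (event - start).days
        # Keep the shortest interval seen so far for this patient.
        if patient_id not in first or days < first[patient_id]:
            first[patient_id] = days
    return first

first_event_days = time_to_first_event(events)
print(first_event_days)  # {'A': 60, 'B': 20}
```

Patient A's later event (June 15) is ignored, so the survival time is the 60 days to the first event on March 2.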
The starting point of the time interval is somewhat arbitrary, so it must be defined
explicitly every time you do a survival analysis. Imagine that you’re studying the
progression of chronic obstructive pulmonary disease (COPD) in a group of
patients. If you want to study the natural history of the disease, the starting point
can be the diagnosis date. But if you’re instead interested in evaluating the
efficacy of a treatment, the starting point can be defined as the date the
treatment began.
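To see why the starting point must be stated explicitly, consider a short Python sketch in which the same patient and the same event yield two different survival times. All dates here are invented for illustration, and survival_days is a hypothetical helper name.

```python
from datetime import date

# Hypothetical COPD patient; every date below is an assumption.
diagnosis_date = date(2020, 3, 1)    # start for a natural-history study
treatment_start = date(2021, 3, 1)   # start for a treatment-efficacy study
event_date = date(2023, 3, 1)        # date the endpoint event occurred

def survival_days(start, event):
    """Days from the chosen starting point to the event."""
    return (event - start).days

from_diagnosis = survival_days(diagnosis_date, event_date)
from_treatment = survival_days(treatment_start, event_date)
print(from_diagnosis, from_treatment)  # 1095 730
```

The event date is identical in both calculations, yet the survival time differs by a full year depending on which starting point the analysis uses.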